{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "30815a85",
   "metadata": {},
   "source": [
    "# LLM Pydantic Program - NVIDIA"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "311e16cb",
   "metadata": {},
   "source": [
    "This guide shows you how to generate structured data with our `LLMTextCompletionProgram`. Given an LLM as well as an output Pydantic class, generate a structured Pydantic object.\n",
    "\n",
    "In terms of the target object, you can choose to directly specify `output_cls`, or specify a `PydanticOutputParser` or any other BaseOutputParser that generates a Pydantic object.\n",
    "\n",
    "in the examples below, we show you different ways of extracting into the `Album` object (which can contain a list of Song objects)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0611198",
   "metadata": {},
   "source": [
    "## Extract into `Album` class\n",
    "\n",
    "This is a simple example of parsing an output into an `Album` schema, which can contain multiple songs.\n",
    "\n",
    "Just pass `Album` into the `output_cls` property on initialization of the `LLMTextCompletionProgram`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "511a8171",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-readers-file llama-index-embeddings-nvidia llama-index-llms-nvidia"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b029b7e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import getpass\n",
    "import os\n",
    "\n",
    "# del os.environ['NVIDIA_API_KEY']  ## delete key and reset\n",
    "if os.environ.get(\"NVIDIA_API_KEY\", \"\").startswith(\"nvapi-\"):\n",
    "    print(\"Valid NVIDIA_API_KEY already in environment. Delete to reset\")\n",
    "else:\n",
    "    nvapi_key = getpass.getpass(\"NVAPI Key (starts with nvapi-): \")\n",
    "    assert nvapi_key.startswith(\n",
    "        \"nvapi-\"\n",
    "    ), f\"{nvapi_key[:5]}... is not a valid key\"\n",
    "    os.environ[\"NVIDIA_API_KEY\"] = nvapi_key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7a83b49-5c34-45d5-8cf4-62f348fb1299",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pydantic import BaseModel\n",
    "from typing import List\n",
    "from llama_index.core import Settings\n",
    "from llama_index.llms.nvidia import NVIDIA\n",
    "from llama_index.embeddings.nvidia import NVIDIAEmbedding\n",
    "from llama_index.core.program import LLMTextCompletionProgram\n",
    "from llama_index.core.program import FunctionCallingProgram"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e4fc4b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = NVIDIA()\n",
    "\n",
    "embedder = NVIDIAEmbedding(model=\"NV-Embed-QA\", truncate=\"END\")\n",
    "Settings.embed_model = embedder\n",
    "Settings.llm = llm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d92e739",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Song(BaseModel):\n",
    "    \"\"\"Data model for a song.\"\"\"\n",
    "\n",
    "    title: str\n",
    "    length_seconds: int\n",
    "\n",
    "\n",
    "class Album(BaseModel):\n",
    "    \"\"\"Data model for an album.\"\"\"\n",
    "\n",
    "    name: str\n",
    "    artist: str\n",
    "    songs: List[Song]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "46c2d509",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt_template_str = \"\"\"\\\n",
    "Generate an example album, with an artist and a list of songs. \\\n",
    "Using the movie {movie_name} as inspiration.\\\n",
    "\"\"\"\n",
    "program = LLMTextCompletionProgram.from_defaults(\n",
    "    output_cls=Album,\n",
    "    prompt_template_str=prompt_template_str,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "498370f4",
   "metadata": {},
   "source": [
    "Run program to get structured output.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ca490bf8",
   "metadata": {},
   "outputs": [],
   "source": [
    "output = program(movie_name=\"The Shining\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40cce83f",
   "metadata": {},
   "source": [
    "The output is a valid Pydantic object that we can then use to call functions/APIs. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53934d3d",
   "metadata": {},
   "outputs": [],
   "source": [
    "output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6401ab8d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.output_parsers import PydanticOutputParser\n",
    "\n",
    "program = LLMTextCompletionProgram.from_defaults(\n",
    "    output_parser=PydanticOutputParser(output_cls=Album),\n",
    "    prompt_template_str=prompt_template_str,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1adc5b2b",
   "metadata": {},
   "outputs": [],
   "source": [
    "output = program(movie_name=\"Lord of the Rings\")\n",
    "output"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a41391d9",
   "metadata": {},
   "source": [
    "## Define a Custom Output Parser\n",
    "\n",
    "Sometimes you may want to parse an output your own way into a JSON object. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60b7b669",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.output_parsers import BaseOutputParser\n",
    "\n",
    "\n",
    "class CustomAlbumOutputParser(BaseOutputParser):\n",
    "    \"\"\"Custom Album output parser.\n",
    "\n",
    "    Assume first line is name and artist.\n",
    "\n",
    "    Assume each subsequent line is the song.\n",
    "\n",
    "    \"\"\"\n",
    "\n",
    "    def __init__(self, verbose: bool = False):\n",
    "        self.verbose = verbose\n",
    "\n",
    "    def parse(self, output: str) -> Album:\n",
    "        \"\"\"Parse output.\"\"\"\n",
    "        if self.verbose:\n",
    "            print(f\"> Raw output: {output}\")\n",
    "        lines = output.split(\"\\n\")\n",
    "        lines = list(filter(None, (line.strip() for line in lines)))\n",
    "        name, artist = lines[1].split(\",\")\n",
    "        songs = []\n",
    "        for i in range(2, len(lines)):\n",
    "            title, length_seconds = lines[i].split(\",\")\n",
    "            songs.append(Song(title=title, length_seconds=length_seconds))\n",
    "\n",
    "        return Album(name=name, artist=artist, songs=songs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e165a0f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt_template_str = \"\"\"\\\n",
    "Generate an example album, with an artist and a list of songs. \\\n",
    "Using the movie {movie_name} as inspiration.\\\n",
    "\n",
    "Return answer in following format.\n",
    "The first line is:\n",
    "<album_name>, <album_artist>\n",
    "Every subsequent line is a song with format:\n",
    "<song_title>, <song_length_in_seconds>\n",
    "\n",
    "\"\"\"\n",
    "program = LLMTextCompletionProgram.from_defaults(\n",
    "    output_parser=CustomAlbumOutputParser(verbose=True),\n",
    "    output_cls=Album,\n",
    "    prompt_template_str=prompt_template_str,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a743006",
   "metadata": {},
   "outputs": [],
   "source": [
    "output = program(movie_name=\"The Dark Knight\")\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ade8b979",
   "metadata": {},
   "source": [
    "# Function Calling Program for Structured Extraction\n",
    "\n",
    "This guide shows you how to do structured data extraction with our `FunctionCallingProgram`. Given a function-calling LLM as well as an output Pydantic class, generate a structured Pydantic object.\n",
    "\n",
    "in the examples below, we show you different ways of extracting into the `Album` object (which can contain a list of Song objects).\n",
    "\n",
    "**NOTE**: The `FunctionCallingProgram` only works with LLMs that natively support function calling, by inserting the schema of the Pydantic object as the \"tool parameters\" for a tool. For all other LLMs, please use our `LLMTextCompletionProgram`, which will directly prompt the model through text to get back a structured output."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6311f7ae",
   "metadata": {},
   "source": [
    "### Without docstring in Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fd22dce2",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = NVIDIA(model=\"meta/llama-3.1-8b-instruct\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "42053ea8-2580-4639-9dcf-566e8427c44e",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Song(BaseModel):\n",
    "    title: str\n",
    "    length_seconds: int\n",
    "\n",
    "\n",
    "class Album(BaseModel):\n",
    "    name: str\n",
    "    artist: str\n",
    "    songs: List[Song]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4afff44e-a746-4b9f-85a9-72058bcdd29f",
   "metadata": {},
   "source": [
    "Define pydantic program"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fe756697-c299-4f9a-a108-944b6693f824",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt_template_str = \"\"\"\\\n",
    "Generate an example album, with an artist and a list of songs. \\\n",
    "Using the movie {movie_name} as inspiration.\\\n",
    "\"\"\"\n",
    "\n",
    "program = FunctionCallingProgram.from_defaults(\n",
    "    output_cls=Album,\n",
    "    prompt_template_str=prompt_template_str,\n",
    "    verbose=True,\n",
    "    llm=llm,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7be01dc-433e-4485-bab0-36a04c3afbcb",
   "metadata": {},
   "source": [
    "Run program to get structured output.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25d02228-2907-4810-932e-83ec9fc71f6b",
   "metadata": {},
   "outputs": [],
   "source": [
    "output = program(\n",
    "    movie_name=\"The Shining\", description=\"Data model for an album.\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c2af9a5",
   "metadata": {},
   "source": [
    "### With docstring in Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "35c01bec",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Song(BaseModel):\n",
    "    \"\"\"Data model for a song.\"\"\"\n",
    "\n",
    "    title: str\n",
    "    length_seconds: int\n",
    "\n",
    "\n",
    "class Album(BaseModel):\n",
    "    \"\"\"Data model for an album.\"\"\"\n",
    "\n",
    "    name: str\n",
    "    artist: str\n",
    "    songs: List[Song]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22268e2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt_template_str = \"\"\"\\\n",
    "Generate an example album, with an artist and a list of songs. \\\n",
    "Using the movie {movie_name} as inspiration.\\\n",
    "\"\"\"\n",
    "program = FunctionCallingProgram.from_defaults(\n",
    "    output_cls=Album,\n",
    "    prompt_template_str=prompt_template_str,\n",
    "    verbose=True,\n",
    "    llm=llm,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9411d0f1",
   "metadata": {},
   "source": [
    "Run program to get structured output.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "066e9c2d",
   "metadata": {},
   "outputs": [],
   "source": [
    "output = program(movie_name=\"The Shining\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27ec0777-28d5-494b-b419-daf6bce2b20e",
   "metadata": {},
   "source": [
    "The output is a valid Pydantic object that we can then use to call functions/APIs. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e51bcf4-e7df-47b9-b380-8e5b900a31e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "output"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9eaa7c0f",
   "metadata": {},
   "source": [
    "# Langchain Output Parsing"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62796f28",
   "metadata": {},
   "source": [
    "Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dca92bdc",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "92652c87",
   "metadata": {},
   "source": [
    "#### Load documents, build the VectorStoreIndex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b4cc4c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import sys\n",
    "\n",
    "logging.basicConfig(stream=sys.stdout, level=logging.INFO)\n",
    "logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
    "\n",
    "from llama_index.core import VectorStoreIndex, SimpleDirectoryReader\n",
    "from IPython.display import Markdown, display"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f102b812",
   "metadata": {},
   "outputs": [],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "704b9386",
   "metadata": {},
   "outputs": [],
   "source": [
    "index = VectorStoreIndex.from_documents(documents, chunk_size=512)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8d38ce4e",
   "metadata": {},
   "source": [
    "#### Define Query + Langchain Output Parser"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c8c0beb7",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.output_parsers import LangchainOutputParser\n",
    "from langchain.output_parsers import StructuredOutputParser, ResponseSchema"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2cc3956",
   "metadata": {},
   "source": [
    "**Define custom QA and Refine Prompts**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d1b20fec",
   "metadata": {},
   "outputs": [],
   "source": [
    "response_schemas = [\n",
    "    ResponseSchema(\n",
    "        name=\"Education\",\n",
    "        description=(\n",
    "            \"Describes the author's educational experience/background.\"\n",
    "        ),\n",
    "    ),\n",
    "    ResponseSchema(\n",
    "        name=\"Work\",\n",
    "        description=\"Describes the author's work experience/background.\",\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3c475d87",
   "metadata": {},
   "outputs": [],
   "source": [
    "lc_output_parser = StructuredOutputParser.from_response_schemas(\n",
    "    response_schemas\n",
    ")\n",
    "output_parser = LangchainOutputParser(lc_output_parser)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc2b558d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.prompts.default_prompts import (\n",
    "    DEFAULT_TEXT_QA_PROMPT_TMPL,\n",
    ")\n",
    "\n",
    "# take a look at the new QA template!\n",
    "fmt_qa_tmpl = output_parser.format(DEFAULT_TEXT_QA_PROMPT_TMPL)\n",
    "print(fmt_qa_tmpl)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e02bf2bc",
   "metadata": {},
   "source": [
    "#### Query Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ff44ad90",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine(\n",
    "    llm=llm,\n",
    ")\n",
    "response = query_engine.query(\n",
    "    \"What are a few things the author did growing up?\",\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama-index-vs8PXMh0-py3.11",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
